Background: Literature Based Discovery (LBD) produces more potential hypotheses than can be manually reviewed, making automatic ranking of these hypotheses critical. In this paper, we introduce the indirect association measures of Linking Term Association (LTA), Minimum Weight Association (MWA), and Shared B to C Set Association (SBC), and compare them to Linking Set Association (LSA), concept embeddings vector cosine, Linking Term Count (LTC), and direct co-occurrence vector cosine. Our proposed indirect association measures extend traditional association measures to quantify indirect rather than direct associations while preserving valuable statistical properties.
Results: We compare several hypothesis ranking methods for LBD against our proposed indirect association measures. We intrinsically evaluate each method's performance using its ability to estimate semantic relatedness on standard evaluation datasets. We extrinsically evaluate each method's ability to rank hypotheses in LBD using a time-slicing dataset based on co-occurrence information, and another time-slicing dataset based on SemRep-extracted relationships. Precision and recall curves are generated by ranking term pairs and applying a threshold at each rank.
Conclusions: Results differ depending on the evaluation methods and datasets, but it is unclear whether this is a result of biases in the evaluation datasets or whether one method is truly better than another. We conclude that LTC and SBC are the best-suited methods for hypothesis ranking in LBD, but there is value in having a variety of methods to choose from.
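The rank-threshold evaluation mentioned in the Results can be illustrated with a minimal sketch. It assumes that each candidate term pair carries a single ranking score and that the gold standard is a set of term pairs treated as true discoveries (for example, pairs that appear only in the later time slice). The function name `precision_recall_at_ranks`, the data layout, and the toy values are hypothetical and are not taken from the paper.

```python
def precision_recall_at_ranks(scored_pairs, true_pairs):
    """Return a (precision, recall) point for a threshold placed at each rank.

    scored_pairs: list of ((start_term, target_term), score) tuples (assumed layout).
    true_pairs:   set of (start_term, target_term) tuples treated as gold standard.
    """
    # Rank term pairs from highest to lowest score.
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    total_true = len(true_pairs)
    curve = []
    hits = 0
    for rank, (pair, _score) in enumerate(ranked, start=1):
        if pair in true_pairs:
            hits += 1
        # Everything at or above this rank counts as predicted.
        precision = hits / rank
        recall = hits / total_true if total_true else 0.0
        curve.append((precision, recall))
    return curve


# Toy data, invented purely for illustration.
scored = [(("a", "x"), 0.9), (("a", "y"), 0.4), (("a", "z"), 0.7)]
gold = {("a", "x"), ("a", "y")}
curve = precision_recall_at_ranks(scored, gold)
```

Each ranking method would supply its own scores for the same candidate pairs, so the resulting curves can be compared directly across methods.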